Abstract. Tree rearrangement operations typically induce a
metric on the space of phylogenetic trees. One important property
of these metrics is the size of the neighbourhood, that is, the
number of trees exactly one operation from a given tree. We present
an expression for the size of the TBR (tree bisection and
reconnection) neighbourhood, thus answering a question first posed
in [1].



1. Introduction 

Phylogenetic trees are a commonly used tool for representing the 
relationships between species in an evolutionary system, especially in 
evolutionary biology. A central task in the study of these trees is 
to determine which among a set of hypothesised trees gives the best 
explanation of empirical data. However, finding the trees that optimize 
some criterion is often computationally prohibitive because of the large 
number of trees to be checked. An approach that avoids this is a 
heuristic hill-climbing algorithm that searches tree space using tree 
rearrangement operations |5] E] . That is, at each iteration the optimal 
tree within one rearrangement operation is chosen as the input for the 
next step, and the algorithm is thus guaranteed to find a local optimum. 

Loosely speaking, a tree rearrangement operation breaks a tree into 
two contiguous parts, and rejoins these parts to form a new tree. 
Among the three tree rearrangement operations of interest, namely NNI
(nearest neighbour interchange), SPR (subtree prune and regraft) and
TBR (tree bisection and reconnection), each induces a distinct metric
on the space of unrooted trees. Several properties of these metrics are
important for understanding the efficiency of the algorithm outlined
above. Our interest in this paper is in the size of the TBR
neighbourhood, that is, the number of trees that can be reached from a
specified starting tree via a single TBR operation.

A phylogenetic tree is an unrooted binary tree in the graph theoretic
sense, with a unique label attached to every leaf, or vertex of degree
one. We denote by $\mathcal{T}_n$ the collection of all phylogenetic
trees whose leaves are the set $\{1, \ldots, n\}$.

For a tree $T \in \mathcal{T}_n$ where $n \ge 4$, Robinson [7] showed
that the NNI neighbourhood $N_{\mathrm{NNI}}(T)$ has size exactly equal
to $2n - 6$, that is,

\[ |N_{\mathrm{NNI}}(T)| = 2n - 6, \]

while Allen and Steel [1] proved that

\[ |N_{\mathrm{SPR}}(T)| = 2(n-3)(2n-7), \]

where $N_{\mathrm{SPR}}(T)$ is the SPR neighbourhood of $T$. It was also
demonstrated in [1] that the size of the TBR neighbourhood is dependent
on the shape of $T$. More recently, in [3], the bounds

\[ c n^2 \log n + O(n^2) \le |N_{\mathrm{TBR}}(T)| \le \frac{2}{3} n^3 - 4n^2 + \frac{16}{3} n + 2 \]

were shown to hold for all $n \ge 4$, with the upper bound being met with
equality if and only if $T$ is a caterpillar, that is, a phylogenetic
tree in which every non-leaf vertex is adjacent to a leaf.

The rest of this paper is divided into three sections. Section 2
contains the definitions required to follow the main content of the
paper. In Section 3, we relate the number of possible rearrangement
operations for $T$ to the size of the neighbourhood, and use this to
reprove Allen and Steel's [1] result for the SPR neighbourhood and to
obtain an expression for the TBR neighbourhood dependent on the tree
shape. In Section 4 we characterise the trees that respectively maximise
and minimise the size of $N_{\mathrm{TBR}}(T)$ for all binary tree
spaces $\mathcal{T}_n$. These characterisations are also extended to
reprove the tight upper bound given in [3], and to further prove an
asymptotically tight lower bound.

2. Definitions 

Before giving formal definitions of each rearrangement operation, we
introduce some useful terminology. Given a tree $T$ and a subset $X$ of
the leaf set of $T$, the restriction of $T$ to $X$, or $T|X$, is the
minimal subtree of $T$ connecting the leaves in $X$, with all vertices
of degree two suppressed. A split $X|Y$ of a tree is a bipartition of
the leaf set such that $T|X$ and $T|Y$ are vertex disjoint subtrees of
$T$. Further, if $X' \subseteq X$ and $Y' \subseteq Y$, then we call
$X'|Y'$ a partial split of $T$. A split is trivial if one of its parts
contains only one leaf. The set of all splits of $T$ is denoted by
$\Sigma(T)$. If $T$ is a tree with the leaf set $Z$, then a cluster of
$T$ is a set $X$ such that $X|(Z - X) \in \Sigma(T)$. If $|X| = 2$, then
we call $X$ a cherry.

A binary tree is a tree in which every vertex has degree either one or
three. Note that a binary tree with $n \ge 3$ leaves has $2n - 2$
vertices in total and $2n - 3$ edges, an observation that will be used
throughout this paper.
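The definitions of cluster and split can be made concrete with a short sketch. The encoding below is an assumption made for illustration and is not taken from the paper: a leaf is an integer, an interior vertex is a tuple of its subtrees, and the outermost tuple has three entries so that it stands for an interior vertex of the unrooted tree. The names `leaf_set` and `nontrivial_splits` are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding: a leaf is an int, an interior vertex is a tuple.
Tree = Union[int, tuple]


def leaf_set(tree: Tree) -> FrozenSet[int]:
    """Return the set of leaf labels at or below this node."""
    if isinstance(tree, int):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def nontrivial_splits(tree: Tree) -> List[Tuple[FrozenSet[int], FrozenSet[int]]]:
    """List the non-trivial splits A|B, one for each interior edge.

    The outermost tuple must have three children; nested tuples have two.
    """
    all_leaves = leaf_set(tree)
    found = []

    def visit(node: Tree) -> None:
        if isinstance(node, int):
            return
        for child in node:
            if not isinstance(child, int):
                # The edge from node to child separates the leaves below
                # child (a cluster) from all remaining leaves.
                below = leaf_set(child)
                found.append((below, all_leaves - below))
            visit(child)

    visit(tree)
    return found


if __name__ == "__main__":
    for a, b in nontrivial_splits(((1, 2), (3, 4), (5, 6))):
        print(sorted(a), "|", sorted(b))
```

With this encoding a tree on $n$ leaves yields $n - 3$ non-trivial splits, one per interior edge, in line with the edge count noted above.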

Although NNI was the point of departure for the study of these
operations [7], we will first define TBR, being the most general of the
three. A TBR operation on a binary phylogenetic tree $T$ involves
deleting some edge $e$ from $T$ (bisection), and subsequently inserting
a new edge $f$ so that the resulting tree $T'$ is distinct from $T$
(reconnection). Since we require $T'$ to be binary, it is necessary to
subdivide an edge in one (in the case that the other component is an
isolated labelled vertex) or both components created in the bisection
stage before inserting the new edge. An example is given in Fig. 1. We
can transform $T_1$ into $T_2$ by first deleting the edge $e$ from
$T_1$, and then adding the new edge $f$. To check that there has been no
other change to the tree's structure, note that deleting $e$ from $T_1$
gives the same forest as deleting $f$ from $T_2$.


Figure 1. Two trees $T_1, T_2 \in \mathcal{T}_6$ that are one TBR
operation apart.

For a binary tree $T$, we define the set $\mathcal{O}_{\mathrm{TBR}}(T)$
to be all possible TBR operations $\theta$ that can be applied to the
tree $T$. An important point to note here is that for distinct
$\theta_1, \theta_2 \in \mathcal{O}_{\mathrm{TBR}}(T)$, we may have
$\theta_1(T) = \theta_2(T)$. The reason for this is that an operation
$\theta \in \mathcal{O}_{\mathrm{TBR}}(T)$ is not specified solely by
the output tree $\theta(T)$, but also by the edge $e$ that is deleted
from $T$ in the bisection stage of $\theta$.

Observe that for any two distinct trees $T, T' \in \mathcal{T}_n$,
there is a TBR operation $\theta \in \mathcal{O}_{\mathrm{TBR}}(T)$ for
which $\theta(T) = T'$ if and only if there is some split
$X_1|X_2 \in \Sigma(T) \cap \Sigma(T')$ such that $T|X_i = T'|X_i$ for
all $i \in \{1, 2\}$. In this case, $X_1|X_2$ is the split induced by
$\theta$. To demonstrate this, if the edges $e$ and $f$ have
respectively been deleted and inserted in the TBR operation that changes
$T$ into $T'$, then the forest obtained by deleting $e$ from $T$ must be
identical to the forest obtained by deleting $f$ from $T'$. This
provides not only the common bipartition of the leaf set, but also the
common subtrees induced by each part of this bipartition.

SPR is a special case of TBR in which there is less freedom at the
reconnection stage. Let $T$ be a binary tree, let
$\theta \in \mathcal{O}_{\mathrm{TBR}}(T)$ be a TBR operation on $T$ in
which the edge $e$ is deleted, and let $X_1|X_2$ be the split of $T$
induced by $e$. Then $\theta$ is an SPR operation for $T$ if and only
if, without loss of generality,
$T|(X_2 \cup \{x_1\}) = \theta(T)|(X_2 \cup \{x_1\})$ for some
$x_1 \in X_1$. Moreover, if this holds then in fact the same property
holds for all $x_1 \in X_1$.

The significance of this condition is that one of the components
formed in the bisection of $T$, in this case $T|X_2$, is treated as a
rooted subtree, and is then regrafted so that this rooting is preserved
with respect to the other component. We say that we have pruned $T|X_2$
from $T$, and regrafted it to form $T'$.

The previous example (refer to Fig. 1) does not represent an SPR
operation, since neither component obtained by deleting $e$ from $T_1$
can be regrafted to the other to form $T_2$. By making a subtle change,
in particular by exchanging the labels 4 and 5 on $T_2$, we get a tree
$T_3$ that can be obtained from $T_1$ by a single SPR operation. This
example is shown in Fig. 2.



Figure 2. Two trees $T_1, T_3 \in \mathcal{T}_6$ that are one SPR
operation apart.

NNI operations are TBR operations in which the reconnection is still
more restricted than for SPR. Let $T$ be a phylogenetic tree, and let
$\theta \in \mathcal{O}_{\mathrm{TBR}}(T)$ be an SPR operation in which
$T|Y$ is pruned from $T$ and regrafted to form $T' = \theta(T)$. We say
that $\theta$ is an NNI operation if and only if there is some cluster
$Z \neq Y$ of $T$ such that we can form $T'$ from $T$ by swapping the
subtrees $T|Y$ and $T|Z$. In this case, $T|Y$ and $T|Z$ can be seen as
adjacent in some sense, as shown by the schematic diagram in Fig. 3.
Note that $T'$ can be obtained from $T$ by four distinct NNI operations,
namely pruning one of the subtrees in $\{T|X, T|Y, T|Z, T|W\}$ and
regrafting it in an appropriate way. Indeed, if $\theta$ is an NNI
operation for $T$, then there are precisely four distinct operations
$\theta' \in \mathcal{O}_{\mathrm{NNI}}(T)$ such that
$\theta(T) = \theta'(T)$. The possibility that two distinct operations
can result in the same tree lies behind the main lemma (Lemma 3.1) in
Section 3.



Figure 3. Two trees that are one NNI operation apart. (Schematic: the
pendant subtrees $X$, $Y$, $Z$, $W$ surround an interior edge, and $Y$
and $Z$ exchange places in the second tree.)

Extending our earlier notation for TBR to both SPR and NNI, we have

\[ \mathcal{O}_{\mathrm{NNI}}(T) \subseteq \mathcal{O}_{\mathrm{SPR}}(T) \subseteq \mathcal{O}_{\mathrm{TBR}}(T) \]

for any tree $T$. The TBR neighbourhood of $T$ is the set

\[ N_{\mathrm{TBR}}(T) = \{\theta(T) : \theta \in \mathcal{O}_{\mathrm{TBR}}(T)\}. \]

That is, $N_{\mathrm{TBR}}(T)$ is the set of all trees that are
precisely one TBR rearrangement operation from $T$. The NNI
neighbourhood $N_{\mathrm{NNI}}(T)$ and the SPR neighbourhood
$N_{\mathrm{SPR}}(T)$ are defined similarly. Clearly, the elements in
these neighbourhoods are dependent on the operation in question, and we
have the corresponding nesting property as above. More explicitly,

\[ N_{\mathrm{NNI}}(T) \subseteq N_{\mathrm{SPR}}(T) \subseteq N_{\mathrm{TBR}}(T). \]

3. Neighbourhood Sizes 

The approach used by Allen and Steel [1] to determine both the size
of the SPR neighbourhood and the upper bound on the size of the TBR
neighbourhood was to count directly the number of trees that can be
obtained from $T$ via a single operation. While this seems the most
natural approach, there is a fundamental barrier to performing this
enumeration that we alluded to briefly in Section 2. This is the fact
that some operations in $\mathcal{O}_{\mathrm{TBR}}(T)$ may be
redundant. That is, there may be distinct elements
$\theta_1, \theta_2 \in \mathcal{O}_{\mathrm{TBR}}(T)$ for which

\[ \theta_1(T) = \theta_2(T). \]

This potentially leads to counting some trees in $N_{\mathrm{TBR}}(T)$
more than once. If we can determine precisely which operations in
$\mathcal{O}_{\mathrm{TBR}}(T)$ output the same tree, then we can relate
the size of the TBR neighbourhood to the number of operations on $T$.

It transpires, as the next lemma shows, that the only redundant TBR
operations are NNI operations.

Lemma 3.1. Let $\theta, \theta' \in \mathcal{O}_{\mathrm{TBR}}(T)$ be
distinct TBR operations. If $\theta(T) = \theta'(T)$, then
$\theta \in \mathcal{O}_{\mathrm{NNI}}(T)$.

Proof. Suppose that $A|B$ is the split of $T$ induced by $\theta$, and
that $A'|B'$ is the split induced by $\theta'$. Then $A|B \neq A'|B'$,
as otherwise $\theta(T)$ must be distinct from $\theta'(T)$. Hence we
may assume that $A \subset A'$, and hence also $B' \subset B$. Since
$T|A' = \theta(T)|A'$, we have immediately that
$\theta \in \mathcal{O}_{\mathrm{SPR}}(T)$. Let
$A_0 = A, A_1, \ldots, A_k = A'$ be clusters of $T$ such that

(i) $A_i|B'$ is a partial split of $T$; and

(ii) $A_{i+1}$ is a minimal cluster of $T$ that contains $A_i$.



Figure 4. The tree $T$ in the proof of Lemma 3.1.



The generic structure of $T$ is depicted in Fig. 4, where $f$ and $e$
are two edges whose removal will result in the splits $A'|B'$ and
$(A_k - A_{k-1})|(X - (A_k - A_{k-1}))$, respectively. Now consider the
operation $\theta$. If $k \ge 3$ then in order for
$T|A' = \theta(T)|A'$ to hold, we must regraft the pruned subtree $T|A$
in the same place, but this implies $T = \theta(T)$, a contradiction. If
$k = 2$, to ensure $T|A' = \theta(T)|A'$, we must regraft $T|A$ to the
edge $e$ or $f$. In other words, $\theta(T)$ is obtained from $T$ by
swapping either the subtrees $T|A$ and $T|B'$, or $T|A$ and
$T|(A_2 - A_1)$, from which it follows that $\theta$ is an NNI
operation. Now it remains to establish the case $k = 1$. To this end, we
can further assume that $|A_1 - A_0| > 1$, because otherwise
$T|A' = \theta(T)|A'$ implies the contradiction $T = \theta(T)$.
Therefore the generic structure of $T$ in this case can be represented
as in Fig. 5, where $C_1 \cup C_2 = A_1 - A_0$. Using the constraints
$T \neq \theta(T)$ and $T|A' = \theta(T)|A'$ again, we can assert that
$T|A$ must be regrafted to either $e_1$ or $e_2$. This completes the
proof as in both cases $\theta$ is an NNI operation. □

Figure 5. The tree $T$ for the case $k = 1$ in the proof of Lemma 3.1.



As a consequence of Lemma 3.1, we can express the sizes of both
the SPR and the TBR neighbourhoods in terms of the number of each
operation for a tree and the size of the NNI neighbourhood.

Lemma 3.2. For $T \in \mathcal{T}_n$ where $n \ge 4$, we have

\[ |N_{\mathrm{SPR}}(T)| = |\mathcal{O}_{\mathrm{SPR}}(T)| - 3|N_{\mathrm{NNI}}(T)|, \]

and

\[ |N_{\mathrm{TBR}}(T)| = |\mathcal{O}_{\mathrm{TBR}}(T)| - 3|N_{\mathrm{NNI}}(T)|. \]

Proof. This follows from Lemma 3.1 and the observation in Section 2
that, if $\theta$ is an NNI operation for $T$, then there are precisely
four distinct operations $\theta' \in \mathcal{O}_{\mathrm{NNI}}(T)$
such that $\theta(T) = \theta'(T)$. □

This lemma forms the basis of the two key results for this section.
Both the number of distinct SPR operations and the number of distinct
TBR operations for any given tree can be found relatively easily. We
proceed with the SPR case first.

Theorem 3.3. For a tree $T \in \mathcal{T}_n$ where $n \ge 4$, we have

\[ |\mathcal{O}_{\mathrm{SPR}}(T)| = 4(n-2)(n-3). \]

Proof. We consider two types of SPR operation on $T$, firstly those that
induce a trivial split on $T$, and secondly those that induce a
non-trivial split. In the first case, there are $n$ possible leaves that
can be pruned from $T$, and for each leaf $x$ there are $2n - 6$ edges
in $T - x$ to which we can reconnect it so that the resulting tree is
different from $T$.

In the second case, suppose that the non-trivial split is $A|B$, with
$|A| = a$ and $|B| = b$. If we choose $T|A$ to be the pruned subtree,
then there are $2b - 3$ edges to which we can regraft $T|A$. However,
one of these results in the same tree as we began with, namely $T$. Thus
there are $2b - 4$ such distinct operations. Similarly, if we choose
$T|B$ as the pruned subtree, then there are $2a - 4$ possible SPR
operations. Thus there are $2n - 8$ distinct SPR operations for each of
the $n - 3$ non-trivial splits of $T$. Hence

\begin{align*}
|\mathcal{O}_{\mathrm{SPR}}(T)| &= n(2n - 6) + (n - 3)(2n - 8) \\
&= 4(n-2)(n-3).
\end{align*}

□ 
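The counting argument in this proof can be replayed numerically. The sketch below is illustrative only: the function name `count_spr_operations` and the convention of passing the sizes $|A|$ of the non-trivial splits as a list are our own assumptions. It adds the $n(2n-6)$ leaf-pruning operations to the $(2b-4) + (2a-4)$ operations contributed by each non-trivial split.

```python
def count_spr_operations(split_sizes, n):
    """Count SPR operations by the two cases of the proof.

    split_sizes lists |A| for each of the n - 3 non-trivial splits A|B
    (a hypothetical input convention chosen for this sketch).
    """
    total = n * (2 * n - 6)                 # prune one leaf, regraft elsewhere
    for a in split_sizes:
        b = n - a
        total += (2 * b - 4) + (2 * a - 4)  # prune T|A, or prune T|B
    return total


if __name__ == "__main__":
    n = 6
    for sizes in ([2, 3, 4], [2, 2, 2]):    # caterpillar, three cherries
        ops = count_spr_operations(sizes, n)
        print(sizes, ops, 4 * (n - 2) * (n - 3))
```

Both six-leaf shapes give the same count, which reflects the fact that the number of SPR operations depends only on $n$ and not on the shape of the tree.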

As a corollary to this theorem, we obtain the result of Allen and
Steel [1] for the size of the SPR neighbourhood. The proof is omitted,
as it follows directly from the size of the NNI neighbourhood (see
Section 1), Lemma 3.2 and Theorem 3.3.

Corollary 3.4 (Theorem 2.1, [1]). For $T \in \mathcal{T}_n$ where
$n \ge 4$, we have

\[ |N_{\mathrm{SPR}}(T)| = 2(n-3)(2n-7). \]

We require one further idea before tackling the TBR problem. For a
binary tree $T$, we define $\Gamma(T)$ by

\[ \Gamma(T) = \sum_{A|B} |A| \cdot |B|, \]

where the sum is taken over all non-trivial splits $A|B$ of $T$. This
quantity is closely related to the Wiener index which arose out of
chemical graph theory [2].
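As an illustration of this quantity, the following sketch sums $|A| \cdot |B|$ over the non-trivial splits of a small tree. The nested-tuple encoding (integers for leaves, a three-entry outer tuple for an interior vertex of the unrooted tree, two-entry tuples below it) and the names `count_leaves` and `gamma` are assumptions made for this example rather than notation from the paper.

```python
def count_leaves(tree):
    """Number of leaves at or below a node of the nested-tuple encoding."""
    if isinstance(tree, int):
        return 1
    return sum(count_leaves(child) for child in tree)


def gamma(tree):
    """Sum of |A| * |B| over all non-trivial splits A|B of the tree.

    Every tuple strictly below the outermost one corresponds to an
    interior edge, and hence to exactly one non-trivial split.
    """
    n = count_leaves(tree)
    total = 0
    stack = list(tree)
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            continue
        size = count_leaves(node)
        total += size * (n - size)
        stack.extend(node)
    return total


if __name__ == "__main__":
    print(gamma((1, 2, (3, (4, (5, 6))))))   # six-leaf caterpillar
    print(gamma(((1, 2), (3, 4), (5, 6))))   # six-leaf tree with three cherries
```

For the six-leaf caterpillar the splits have sizes 2|4, 3|3 and 4|2, giving 8 + 9 + 8 = 25, while the tree with three cherries gives 3 * 8 = 24.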

Theorem 3.5. For a tree $T \in \mathcal{T}_n$ where $n \ge 4$, we have

\[ |\mathcal{O}_{\mathrm{TBR}}(T)| = 4\Gamma(T) - 4(n-2)(n-3). \]

Proof. We consider two types of TBR operation on $T$, firstly those that
induce a trivial split on $T$, and secondly those that induce a
non-trivial split. The argument in the first case is identical to that
given in the proof of Theorem 3.3, and gives $n(2n - 6)$ distinct TBR
operations.

Now, let $A|B$ be some non-trivial split of $T$ induced by the edge
$e$. Then when we bisect $T$ by deleting $e$, there are $2|A| - 3$ edges
in one component of the resulting forest and $2|B| - 3$ edges in the
other. Hence, there are $(2|A| - 3)(2|B| - 3)$ ways to choose an edge
from each of $T|A$ and $T|B$. Precisely one of these results in
re-forming $T$. Hence, by taking a sum over all non-trivial splits $A|B$
of $T$, we get

\begin{align*}
|\mathcal{O}_{\mathrm{TBR}}(T)| &= n(2n - 6) + \sum_{A|B} \left[ (2|A| - 3)(2|B| - 3) - 1 \right] \\
&= 4\Gamma(T) - 4(n-2)(n-3).
\end{align*}

□ 
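The two sides of the identity in this proof can be compared directly on small examples. In the sketch below, which is only an illustration, the function names and the convention of describing a tree by the list of sizes $|A|$ of its non-trivial splits are our own assumptions.

```python
def count_tbr_operations(split_sizes, n):
    """Count TBR operations case by case, as in the proof."""
    total = n * (2 * n - 6)                      # operations inducing a trivial split
    for a in split_sizes:
        b = n - a
        total += (2 * a - 3) * (2 * b - 3) - 1   # edge pairs, minus the one re-forming T
    return total


def tbr_operations_closed_form(split_sizes, n):
    """Evaluate 4 * Gamma - 4 (n - 2)(n - 3) from the same split sizes."""
    gamma_value = sum(a * (n - a) for a in split_sizes)
    return 4 * gamma_value - 4 * (n - 2) * (n - 3)


if __name__ == "__main__":
    n = 6
    for sizes in ([2, 3, 4], [2, 2, 2]):
        print(sizes, count_tbr_operations(sizes, n),
              tbr_operations_closed_form(sizes, n))
```

Unlike the SPR count, the result now differs between the two six-leaf shapes (52 for the caterpillar, 48 for the tree with three cherries), which is where the dependence on tree shape enters.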

This brings us to the following key result in this paper, which relates
the size of the TBR neighbourhood of a phylogenetic tree to its shape,
and provides an effective way to calculate this quantity. Also, as we
will see in the next section, Theorem 3.6 gives us enough traction to
characterise the trees that respectively maximise and minimise the size
of the TBR neighbourhood.

Theorem 3.6. For $T \in \mathcal{T}_n$ where $n \ge 4$, we have

\[ |N_{\mathrm{TBR}}(T)| = 4\Gamma(T) - (4n-2)(n-3). \]

Proof. This follows immediately from the size of the NNI neighbourhood,
Lemma 3.2 and Theorem 3.5. □
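To indicate how the theorem is used in practice, here is a minimal sketch that evaluates the formula once the sum of $|A| \cdot |B|$ over the non-trivial splits is known. The function name `tbr_neighbourhood_size` and its argument order are hypothetical choices for this example.

```python
def tbr_neighbourhood_size(gamma_value, n):
    """Evaluate 4 * Gamma(T) - (4n - 2)(n - 3) for a tree with n >= 4 leaves."""
    if n < 4:
        raise ValueError("the formula is stated for n >= 4")
    return 4 * gamma_value - (4 * n - 2) * (n - 3)


if __name__ == "__main__":
    # Six-leaf caterpillar: splits 2|4, 3|3, 4|2, so Gamma = 25.
    print(tbr_neighbourhood_size(25, 6))
    # Six-leaf tree with three cherries: three splits 2|4, so Gamma = 24.
    print(tbr_neighbourhood_size(24, 6))
```

For $n = 4$ there is a single non-trivial split of type 2|2, so the sum equals 4 and the formula returns 2, which agrees with the fact that there are only three trees on four leaves.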



4. Characterisations of the Extremal Cases 

Since the size of the TBR neighbourhood for $T$ is dependent on both
the number of leaves in $T$ and the shape of $T$, it makes sense to
characterise which tree shapes give the extreme values for this size. As
a consequence of Theorem 3.6, it suffices to determine which tree shapes
maximise and minimise the size of $\Gamma(T)$ over all trees in
$\mathcal{T}_n$ for some $n$. We begin with the easier case, that is,
finding the trees that maximise $\Gamma(T)$.

Lemma 4.1. Let $T \in \mathcal{T}_n$ be a tree such that
$\Gamma(T) \ge \Gamma(T')$ for all $T' \in \mathcal{T}_n$. Then $T$ is a
caterpillar.

Proof. Suppose that $\{x_1, x_2\}$ and $\{x_3, x_4\}$ are cherries of
$T$, and let the sets $Y_1, \ldots, Y_k$ partition the remaining leaves
so that $T$ can be represented as in Fig. 6.

Figure 6. The tree $T$ in the proof of Lemma 4.1.



Setting $y_i = |Y_i|$, it will suffice to show that $y_i = 1$ for all
$i$. Assuming otherwise, let $i \in \{1, \ldots, k\}$ be the smallest
index such that $y_i > 1$. Now we form a second tree $T'$ by moving the
subtree $T|Y_i$ to the position adjacent to $x_1$. The tree $T'$ is
shown in Fig. 7.

Now, calculating the difference between $\Gamma(T)$ and $\Gamma(T')$, we
find that

\begin{align*}
\Gamma(T) - \Gamma(T') &= \sum_{j=0}^{i-1} (j+2)(n-j-2) - \sum_{j=0}^{i-1} (y_i+j+1)(n-y_i-j-1) \\
&= \sum_{j=0}^{i-1} \left[ (j+2)n - (j+2)^2 - (y_i+j+1)n + (y_i+j+1)^2 \right] \\
&= (1-y_i) \sum_{j=0}^{i-1} (n - y_i - 3 - 2j) \\
&= i(1-y_i)(n - y_i - i - 2).
\end{align*}

Since $y_j \ge 1$ for all $j$, we have the inequality
$y_i + (i-1) \le n-4$, from which $n - y_i - i - 2$ is strictly
positive. Together with the assumption $y_i > 1$, we conclude that
$\Gamma(T) < \Gamma(T')$, a contradiction as required. Therefore
$y_i = 1$ indeed holds for all $i \in \{1, \ldots, k\}$, and $T$ is a
caterpillar.




Figure 7. The tree $T'$ in the proof of Lemma 4.1.

□ 
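The chain of equalities in this proof is a purely algebraic identity in $n$, $i$ and $y_i$, so it can be spot-checked by direct summation. The sketch below is an illustration under our own naming (`gamma_difference_by_sums`, `gamma_difference_closed`, with `y` standing for $y_i$); it does not build the trees themselves.

```python
def gamma_difference_by_sums(n, i, y):
    """First line of the derivation: the two sums over j = 0, ..., i - 1."""
    first = sum((j + 2) * (n - j - 2) for j in range(i))
    second = sum((y + j + 1) * (n - y - j - 1) for j in range(i))
    return first - second


def gamma_difference_closed(n, i, y):
    """Last line of the derivation: i (1 - y)(n - y - i - 2)."""
    return i * (1 - y) * (n - y - i - 2)


if __name__ == "__main__":
    for n, i, y in [(8, 1, 2), (10, 2, 3), (12, 3, 4)]:
        print(n, i, y,
              gamma_difference_by_sums(n, i, y),
              gamma_difference_closed(n, i, y))
```

Whenever $y > 1$ and $y + (i - 1) \le n - 4$, both expressions are negative, which is the inequality the proof relies on.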

Recall that the exact upper bound on the size of
$N_{\mathrm{TBR}}(T)$ for a tree $T \in \mathcal{T}_n$ was proven in [3]
by induction on $n$. Corollary 4.2 confirms this result using a
different approach.

Corollary 4.2 (Theorem 2.1, [3]). The tree $T \in \mathcal{T}_n$
maximises the size of the TBR neighbourhood over $\mathcal{T}_n$ if and
only if $T$ is a caterpillar. Moreover, if $T$ is a caterpillar then

\[ |N_{\mathrm{TBR}}(T)| = \frac{2}{3} n^3 - 4n^2 + \frac{16}{3} n + 2. \]

Proof. The first part of the corollary follows from Lemma 4.1. To find
the size of the neighbourhood, we apply Theorem 3.6, from which we have

\begin{align*}
|N_{\mathrm{TBR}}(T)| &= 4\Gamma(T) - (4n-2)(n-3) \\
&= 4 \sum_{i=2}^{n-2} i(n-i) - (4n-2)(n-3) \\
&= \frac{2}{3} n^3 - 4n^2 + \frac{16}{3} n + 2.
\end{align*}



□ 
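The closed form for the caterpillar can be checked against the sum it comes from. The following sketch uses exact rational arithmetic from Python's standard `fractions` module; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def caterpillar_gamma(n):
    """Sum of i (n - i) for i = 2, ..., n - 2, the split sizes of a caterpillar."""
    return sum(i * (n - i) for i in range(2, n - 1))


def caterpillar_tbr_size(n):
    """Neighbourhood size obtained from 4 * Gamma - (4n - 2)(n - 3)."""
    return 4 * caterpillar_gamma(n) - (4 * n - 2) * (n - 3)


def caterpillar_closed_form(n):
    """The cubic (2/3) n^3 - 4 n^2 + (16/3) n + 2, evaluated exactly."""
    return Fraction(2, 3) * n**3 - 4 * n**2 + Fraction(16, 3) * n + 2


if __name__ == "__main__":
    for n in range(4, 10):
        print(n, caterpillar_tbr_size(n), caterpillar_closed_form(n))
```

Using `Fraction` avoids floating-point rounding in the thirds, so the comparison with the integer-valued sum is exact.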



Characterising the trees that minimise the size of the TBR
neighbourhood relies heavily on Lemma 4.3. Before proving this, we give
an example of the simplest case of this lemma. Referring to Fig. 8,
suppose that the sizes of the pendant subtrees labelled by
$X_1, \ldots, X_4$ are $x_1, \ldots, x_4$ respectively. If this tree has
a minimal value for $\Gamma(T)$, then since $\Gamma(T)$ is the sum of
$|A| \cdot |B|$ over all non-trivial splits $A|B$, we must have

\[ (x_1 + x_2)(x_3 + x_4) \le \min\{(x_1 + x_3)(x_2 + x_4), (x_1 + x_4)(x_2 + x_3)\}. \]

Assuming without loss of generality that $x_1$ is the smallest of the
four quantities, it is easy to show that $x_2$ is the next smallest.
Lemma 4.3 extends this observation to a more general result.

Figure 8. A tree illustrating the simplest case of Lemma 4.3.

Lemma 4.3. Let $X = \{1, \ldots, n\}$, and let $T \in \mathcal{T}_n$ be
such that $\Gamma(T) \le \Gamma(T')$ for all $T' \in \mathcal{T}_n$.
Further, for some $k \ge 0$ let $X_1, \ldots, X_4, Y_1, \ldots, Y_k$
partition $X$ such that the following hold:

(i) $X_i|(X - X_i) \in \Sigma(T)$ for all $i \in \{1, \ldots, 4\}$;

(ii) $Y_i|(X - Y_i) \in \Sigma(T)$ for all $i \in \{1, \ldots, k\}$; and

(iii) $A_i|(X - A_i) \in \Sigma(T)$ for all $i \in \{0, \ldots, k\}$,
where $A_i = X_1 \cup X_2 \cup Y_1 \cup \cdots \cup Y_i$.

Then without loss of generality we have $x_1 \le x_2 \le x_3 \le x_4$,
where $x_i = |X_i|$.

Proof. Swapping the subscripts of the $X_i$ if necessary, we can assume
$x_1 \le x_2 \le x_3$. Supposing that the lemma is false, we have
$x_2 > x_4$. Then either $x_1 = x_3$, and so
$x_1 \ge x_2 \ge x_3 \ge x_4$, contradicting our assumption that the
lemma is false, or $x_1 < x_3$.

Figure 9. The tree $T$ in Lemma 4.3.

Figure 9 shows the general structure of a tree $T$ that satisfies the
conditions of the lemma. Let $T_1$ be the tree obtained from $T$ by
swapping the subtrees labelled by $X_1$ and $X_3$, and let $T_2$ be
similarly obtained by swapping the subtrees $T|X_2$ and $T|X_4$. Let
$y_i = |Y_i|$, and $b_0 = 0$, $b_i = b_{i-1} + y_i$. Then we have

\begin{align*}
\Gamma(T) - \Gamma(T_1) &= \sum_{j=0}^{k} (x_1 + x_2 + b_j)(n - x_1 - x_2 - b_j) - \sum_{j=0}^{k} (x_2 + x_3 + b_j)(n - x_2 - x_3 - b_j) \\
&= (x_3 - x_1) \left[ 2 \sum_{j=0}^{k} b_j - (k+1)(n - x_1 - 2x_2 - x_3) \right].
\end{align*}

Since we assume that $\Gamma(T) \le \Gamma(T_1)$, we get

\begin{align*}
\Gamma(T) - \Gamma(T_2) &= (x_4 - x_2) \left[ 2 \sum_{j=0}^{k} b_j - (k+1)(n - 2x_1 - x_2 - x_4) \right] \\
&> (x_4 - x_2) \left[ 2 \sum_{j=0}^{k} b_j - (k+1)(n - x_1 - 2x_2 - x_3) \right] \\
&= \frac{x_4 - x_2}{x_3 - x_1} \left( \Gamma(T) - \Gamma(T_1) \right) \\
&\ge 0,
\end{align*}

contradicting the fact that $\Gamma(T) \le \Gamma(T_2)$.



□

Applying Lemma 4.3, we can completely characterise those trees $T$
that minimise the size of $\Gamma(T)$, and therefore those trees that
minimise the size of the TBR neighbourhood.

Lemma 4.4. Let $X = \{1, 2, \ldots, n\}$ for some
$n = \sum_{i=0}^{k} \alpha_i 2^i$, where $\alpha_i \in \{0, 1\}$ for
$0 \le i \le k$ and $\alpha_k = 1$. Let
$\beta_j = \sum_{i=j}^{k} \alpha_i 2^{i-j}$. Let $T \in \mathcal{T}_n$
be such that $\Gamma(T) \le \Gamma(T')$ for all $T' \in \mathcal{T}_n$.
Then for all $0 \le j \le k-1$ there is a partition
$X_1, \ldots, X_{\beta_j}$ of $X$ into $\beta_j$ disjoint subsets such
that the following properties hold:

(i) $X_p|(X - X_p) \in \Sigma(T)$ for all $1 \le p \le \beta_j$; and

(ii) $|X_p| = 2^j$ for all $1 \le p < \beta_j$.

Proof. For $j = 0$, this holds trivially. We assume that for some
$0 \le j < k-1$, the partition $X_1, \ldots, X_{\beta_j}$ of $X$
satisfies the conditions of the lemma.

Suppose that for $1 \le p < q < \beta_j$, there is no set $Y$ that
contains either $X_p$ or $X_q$ such that $Y|(X - Y) \in \Sigma(T)$ and
$|Y| = 2^{j+1}$. Then we can apply Lemma 4.3 to find a tree $T'$ for
which $\Gamma(T') < \Gamma(T)$. Hence, for $m$ such that
$2m < \beta_j$, there are disjoint subsets $X'_1, \ldots, X'_m$ of $X$
such that $X'_p|(X - X'_p) \in \Sigma(T)$ and $|X'_p| = 2^{j+1}$.

There are two cases to consider. Suppose firstly that
$2m = \beta_j - 2$. Then there is some $1 \le p < \beta_j$ such that
$X_p$ is not contained in any $Y$ with $Y|(X - Y) \in \Sigma(T)$ and
$|Y| = 2^{j+1}$. We can then use Lemma 4.3 again to show that if
$X'_{\beta_{j+1}} = X_p \cup X_{\beta_j}$, then
$X'_{\beta_{j+1}}|(X - X'_{\beta_{j+1}}) \in \Sigma(T)$. Since
$m + 1 = \beta_{j+1}$ we have the required partition.

On the other hand, if $2m = \beta_j - 1$ then we can use Lemma 4.3 to
show that there is some $1 \le p \le m$ such that, if
$X'_{\beta_{j+1}} = X_{\beta_j} \cup X'_p$ then
$X'_{\beta_{j+1}}|(X - X'_{\beta_{j+1}}) \in \Sigma(T)$. Again, this
gives the required partition, completing the induction. □
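The block sizes promised by this lemma are easy to tabulate. The sketch below assumes, as the surrounding text indicates, that the number of blocks at level $j$ is the integer formed by the binary digits of $n$ from position $j$ upwards, which equals $\lfloor n / 2^j \rfloor$. The function names `beta`, `beta_from_digits` and `block_sizes` are hypothetical.

```python
def beta(n, j):
    """Number of blocks at level j: the binary digits of n from position j up."""
    return n >> j


def beta_from_digits(n, j):
    """The same number, written out as a sum over the binary digits of n."""
    k = n.bit_length() - 1
    return sum(((n >> i) & 1) * 2 ** (i - j) for i in range(j, k + 1))


def block_sizes(n, j):
    """Sizes of the blocks: all 2**j except the last, which absorbs the rest."""
    b = beta(n, j)
    return [2 ** j] * (b - 1) + [n - 2 ** j * (b - 1)]


if __name__ == "__main__":
    n = 9
    k = n.bit_length() - 1
    for j in range(k):
        print(j, beta(n, j), block_sizes(n, j))
```

For $n = 9$ this lists nine singletons at level 0, the sizes 2, 2, 2, 3 at level 1 and the sizes 4, 5 at level 2; in each case the last block has at least $2^j$ and fewer than $2^{j+1}$ elements.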

The question now is what these trees look like. In some sense, the
trees that minimise the size of $\Gamma(T)$ are maximally balanced,
although we must carefully define what we mean by this. The only sizes
of $n$ for which an unrooted binary tree can be truly balanced, or
perfect, are $n = 2^h$ or $n = 3 \cdot 2^h$, where the tree is
vertex-transitive with respect to the leaves and we have either two-fold
symmetry about an interior edge of the tree or three-fold symmetry about
an interior vertex. For values of $n$ other than those which admit a
perfect tree, we necessarily lose the global property of
leaf-transitivity.

A tree $T \in \mathcal{T}_n$, where
$3 \cdot 2^k \le n < 3 \cdot 2^{k+1}$ for some $k \ge 0$, is called
complete if and only if

(i) there is a cluster $Y$ of $T$ with $|Y| = 2^{k+1}$; and

(ii) for all clusters $Y$ with $2 \le |Y| \le 2^{k+1}$, there is a
bipartition $Y_1, Y_2$ of $Y$ such that both of $Y_1, Y_2$ are clusters
of $T$, and such that $|Y_1| = 2^j$ and
$2^{j-1} \le |Y_2| \le 2^{j+1}$ for some $j$.

Intuitively, for each cluster $Y$ with $|Y|$ being a power of 2, the
pendant subtree $T|Y$ is perfectly balanced. For more details on
complete trees and a generalization of completeness to trees with
arbitrary vertex degrees, see pTJ. The trees in Lemma 4.4 are precisely
the complete trees in the space $\mathcal{T}_n$, from which we obtain
the next theorem. The proof is routine and omitted.

Theorem 4.5. The tree $T \in \mathcal{T}_n$ minimises the size of the
TBR neighbourhood over $\mathcal{T}_n$ if and only if $T$ is complete.
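For small $n$ the extremal values can be found by brute force, which gives a useful sanity check on the characterisations of this section. The sketch below is our own illustration and not the method of the paper. It enumerates every tree on leaves $1, \ldots, n$ by hanging a rooted binary tree on leaves $2, \ldots, n$ from leaf 1 and inserting the leaves one at a time, then records the smallest and largest value of the sum of $|A| \cdot |B|$ over non-trivial splits. All function names are hypothetical.

```python
def insert_leaf(tree, leaf):
    """Yield each rooted binary tree obtained by attaching leaf to one edge."""
    yield (tree, leaf)                       # attach on the edge above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def all_trees(n):
    """Every tree on leaves 1..n, stored as the rooted tree hanging from leaf 1."""
    trees = [2]
    for leaf in range(3, n + 1):
        trees = [t for old in trees for t in insert_leaf(old, leaf)]
    return trees


def gamma(rooted, n):
    """Sum of s (n - s) over clusters of size 2 <= s <= n - 2."""
    total = 0

    def size(node):
        nonlocal total
        if isinstance(node, int):
            return 1
        s = size(node[0]) + size(node[1])
        if s <= n - 2:                       # skip the trivial split {1} | rest
            total += s * (n - s)
        return s

    size(rooted)
    return total


def gamma_range(n):
    """Smallest and largest value of the sum over all trees on n leaves."""
    values = [gamma(t, n) for t in all_trees(n)]
    return min(values), max(values)


if __name__ == "__main__":
    for n in range(4, 9):
        print(n, len(all_trees(n)), gamma_range(n))
```

The enumeration grows as $(2n-5)!!$, so this is only practical for roughly $n \le 10$; for $n = 8$ it visits 10395 trees.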

Let us continue towards finding the size of the TBR neighbourhood
for complete trees. To this end, we introduce one additional piece of
notation. For each positive integer $m$, there exists a unique binary
expansion

\[ m = \sum_{i=0}^{k} \alpha'_i 2^i, \]

where $\alpha'_i \in \{0, 1\}$ for $0 \le i \le k$ and $\alpha'_k = 1$.
Let $\tau(m) = 1$ if $\alpha'_{k-1} = 1$, and $\tau(m) = 0$ otherwise.
In particular, we have $\tau(2^k) = 0$ for every $k$.
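This indicator simply reads off the binary digit immediately below the leading one, so it can be computed with two bit operations. In the sketch below the function name `tau` is our own label for the quantity just defined, and the value returned for $m = 1$ (where no such digit exists) is an assumption chosen to agree with the convention that powers of two give 0.

```python
def tau(m):
    """Return the binary digit of m just below its leading digit."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    k = m.bit_length() - 1        # position of the leading binary digit
    if k == 0:
        return 0                  # assumption: m = 1 has no lower digit
    return (m >> (k - 1)) & 1


if __name__ == "__main__":
    for m in (2, 3, 5, 6, 8, 12):
        print(m, bin(m), tau(m))
```

Equivalently, the value is 1 exactly when $3 \cdot 2^{k-1} \le m < 2^{k+1}$, where $2^k$ is the largest power of two not exceeding $m$.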

Lemma 4.6. Let $T \in \mathcal{T}_n$ be a complete tree for some
$n = \sum_{i=0}^{k} \alpha_i 2^i$, where $\alpha_i \in \{0, 1\}$ for
$0 \le i \le k$ and $\alpha_k = 1$, and let $\beta_j$ be as in
Lemma 4.4. Then:

\[ \Gamma(T) = \sum_{j=1}^{k-1} \left[ (\beta_j - 1)\left(2^{j+1} n - 2^{2j} \beta_j\right) + \alpha_{j-1} 2^j (n - 2^j) \right] + (\alpha_{k-1} - 1) 2^{k-1} (n - 2^{k-1}). \]

Proof. We use the proof of Lemma 4.4 to obtain this result. For each
of the partitions $X_1, \ldots, X_{\beta_j}$, we take the sum of
$|X_p| \cdot (n - |X_p|)$. Note that $X_{\beta_j}$ contains a cluster of
size $2^j$ if and only if $\tau(|X_{\beta_j}|) = 1$.

Consider a tree on $n$ leaves where $\alpha_{k-1} = 1$, following the
notation of Lemma 4.4. This gives

\begin{align*}
\Gamma(T) &= \sum_{j=1}^{k-1} \left[ \sum_{p=1}^{\beta_j} |X_p| (n - |X_p|) + \tau(|X_{\beta_j}|) 2^j (n - 2^j) \right] \\
&= \sum_{j=1}^{k-1} \left[ 2^j \left( \beta_j - 1 + \tau(|X_{\beta_j}|) \right) (n - 2^j) + |X_{\beta_j}| \cdot (n - |X_{\beta_j}|) \right].
\end{align*}

We also have from Lemma 4.4 that

\[ |X_{\beta_j}| = n - 2^j (\beta_j - 1), \]

and hence $\tau(|X_{\beta_j}|) = 1$ if and only if $\alpha_{j-1} = 1$.
Incorporating this into the above expression, we find

\[ \Gamma(T) = \sum_{j=1}^{k-1} \left[ (\beta_j - 1)\left(2^{j+1} n - 2^{2j} \beta_j\right) + \alpha_{j-1} 2^j (n - 2^j) \right]. \]

In the case that $\alpha_{k-1} = 0$, the partition
$X_1, \ldots, X_{\beta_{k-1}}$ is a bipartition of the leaf set of $T$,
and so we need only take the product $|X_1| \cdot (n - |X_1|)$ once in
the sum above. In other words, we need to subtract

\[ 2^{k-1} (n - 2^{k-1}) \]

from the formula.



□ 
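The counting in this proof translates into a short routine. The sketch below is our reading of that argument and should be treated as an assumption rather than a transcription of the paper's formula: for each level $j$ it adds the contribution of the $\beta_j - 1$ blocks of size $2^j$ together with the last block, adds the extra cluster of size $2^j$ when the binary digit $\alpha_{j-1}$ equals 1, and finally removes the doubly counted bipartition when $\alpha_{k-1} = 0$. The function name `gamma_complete` is hypothetical.

```python
def gamma_complete(n):
    """Sum of |A| * |B| over non-trivial splits of a complete tree on n leaves."""
    if n < 4:
        raise ValueError("expected n >= 4")
    k = n.bit_length() - 1                         # leading binary digit of n
    alpha = [(n >> i) & 1 for i in range(k + 1)]   # binary digits of n
    total = 0
    for j in range(1, k):
        beta_j = n >> j                            # number of blocks at level j
        # beta_j - 1 blocks of size 2**j together with the last block
        total += (beta_j - 1) * (2 ** (j + 1) * n - 4 ** j * beta_j)
        # extra cluster of size 2**j inside the last block
        total += alpha[j - 1] * 2 ** j * (n - 2 ** j)
    # when alpha_{k-1} = 0 the top level is a bipartition counted twice
    total += (alpha[k - 1] - 1) * 2 ** (k - 1) * (n - 2 ** (k - 1))
    return total


if __name__ == "__main__":
    for n in range(4, 13):
        g = gamma_complete(n)
        print(n, g, 4 * g - (4 * n - 2) * (n - 3))
```

The second printed column is the corresponding neighbourhood size from Theorem 3.6; for instance $n = 9$ gives 94 from the splits 2|7 (four times), 3|6 and 4|5, and hence 376 - 204 = 172.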



We conclude this section with two corollaries, the first of which gives
an exact value for the size of the TBR neighbourhood for perfect trees,
and the second an asymptotic lower bound on the size of this
neighbourhood for complete trees. Both proofs follow from Lemma 4.6 and
Theorem 3.6.

Corollary 4.7. Let $T \in \mathcal{T}_n$ be a perfect tree. Then

\[ |N_{\mathrm{TBR}}(T)| = n^2 \left( 4k - \frac{32}{3} \right) + 22n - 6 \]

if $n = 3 \cdot 2^{k-1}$ for some $k$, and

\[ |N_{\mathrm{TBR}}(T)| = n^2 (4k - 13) + 22n - 6 \]

if $n = 2^k$ for some $k$.

Proof. In the first case, where $n = 3 \cdot 2^{k-1}$, we have

\begin{align*}
\Gamma(T) &= \sum_{j=1}^{k-1} n (n - 2^j) \\
&= n^2 (k - 1) - n (2^k - 2) \\
&= n^2 \left( k - \frac{5}{3} \right) + 2n,
\end{align*}

and the result follows by applying Theorem 3.6. On the other hand, if
$n = 2^k$ then

\begin{align*}
\Gamma(T) &= \sum_{j=1}^{k-2} n (n - 2^j) + \left( \frac{n}{2} \right)^2 \\
&= n^2 \left( k - \frac{7}{4} \right) - n (2^{k-1} - 2) \\
&= n^2 \left( k - \frac{9}{4} \right) + 2n,
\end{align*}

and again applying Theorem 3.6 gives the required result. □
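Both closed forms for perfect trees can be confirmed from the sums in the proof. The sketch below is illustrative, with hypothetical function names, and uses exact fractions for the coefficient 32/3.

```python
from fractions import Fraction


def tbr_size_from_gamma(gamma_value, n):
    """Evaluate 4 * Gamma - (4n - 2)(n - 3)."""
    return 4 * gamma_value - (4 * n - 2) * (n - 3)


def perfect_power_of_two(k):
    """Perfect tree on n = 2**k leaves: return (n, neighbourhood size)."""
    n = 2 ** k
    # clusters of size 2**j for j = 1..k-2, plus the central split n/2 | n/2
    gamma_value = sum(n * (n - 2 ** j) for j in range(1, k - 1)) + (n // 2) ** 2
    return n, tbr_size_from_gamma(gamma_value, n)


def perfect_three_times_power(k):
    """Perfect tree on n = 3 * 2**(k-1) leaves: return (n, neighbourhood size)."""
    n = 3 * 2 ** (k - 1)
    # n / 2**j clusters of size 2**j for each j = 1..k-1
    gamma_value = sum(n * (n - 2 ** j) for j in range(1, k))
    return n, tbr_size_from_gamma(gamma_value, n)


if __name__ == "__main__":
    for k in range(2, 6):
        n, size = perfect_power_of_two(k)
        print(n, size, n * n * (4 * k - 13) + 22 * n - 6)
        n, size = perfect_three_times_power(k)
        print(n, size, Fraction(n * n) * (4 * k - Fraction(32, 3)) + 22 * n - 6)
```

For example $n = 8$ gives 106 and $n = 12$ gives 450 by either route.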

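Corollary 4.8. Let $T \in \mathcal{T}_n$ be a complete tree. Then

\[ |N_{\mathrm{TBR}}(T)| = 4n^2 \lfloor \log_2 n \rfloor + O(n^2). \]

Proof. The proof is similar in nature to that for the previous
corollary. If $3 \cdot 2^{k-1} \le n < 2^{k+1}$ for some $k > 1$, then
we have

\begin{align*}
\Gamma(T) &= \sum_{j=1}^{k-1} \left[ \left( n - \sum_{i=0}^{j-1} \alpha_i 2^i - 2^j \right) \left( n + \sum_{i=0}^{j-1} \alpha_i 2^i \right) + \alpha_{j-1} 2^j (n - 2^j) \right] \\
&= n^2 (k - 1) - n (2^k - 2) + \sum_{j=1}^{k-1} \alpha_{j-1} 2^j (n - 2^j) - \sum_{j=1}^{k-1} \left( \sum_{i=0}^{j-1} \alpha_i 2^i \right) \left( 2^j + \sum_{i=0}^{j-1} \alpha_i 2^i \right).
\end{align*}

However, we can obtain a bound for the final term of this expression
by assuming that $\alpha_i = 1$ for all $i \in \{0, \ldots, k-2\}$,
giving

\begin{align*}
\sum_{j=1}^{k-1} \left( \sum_{i=0}^{j-1} \alpha_i 2^i \right) \left( 2^j + \sum_{i=0}^{j-1} \alpha_i 2^i \right) &< \sum_{j=1}^{k-1} 2 \cdot 2^{2j} \\
&= \frac{2}{3} \left( 4^k - 4 \right) \\
&= O(n^2).
\end{align*}

Similarly, we have
$\sum_{j=1}^{k-1} \alpha_{j-1} 2^j (n - 2^j) = O(n^2)$, as required.

The other case, where $2^k \le n < 3 \cdot 2^{k-1}$, follows in a
similar manner, and we complete the proof by applying Theorem 3.6. □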